Evolutionary Algorithms for Finding Interpretable Patterns in Gene Expression Data

Authors

  • Carlos Cano
  • Armando Blanco
  • Fernando García
  • Francisco J. López
Abstract

Microarray technology allows us to measure the expression of thousands of genes simultaneously under specific conditions. Clustering is the main tool used to analyze gene expression data obtained from microarray experiments. By grouping together genes with the same behavior across samples, the resulting clusters suggest new functions for some of the genes. Non-exclusive clustering algorithms are required, as a gene may have more than one biological function. Gene Shaving (Hastie et al. 2000) is a clustering algorithm that looks for coherent clusters with high variance across samples, allowing clusters to overlap. In this paper we present two Evolutionary Algorithm approaches, based on Genetic Algorithms (GA) and Estimation of Distribution Algorithms (EDA), whose aim is to find clusters of similar genes with large between-sample variance. We apply our methods, GA-Shaving and EDA-Shaving, to the S. cerevisiae cell cycle dataset, outperforming Gene Shaving in terms of the quality and size of the clusters obtained. Furthermore, we use GO Term Finder (Boyle et al. 2004) to evaluate the biological interpretation of the results. It computes the most statistically significant biological processes associated with every cluster by means of the annotations of the Gene Ontology (Gene Ontology Consortium 2004).
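The notion of a coherent cluster with large between-sample variance can be illustrated with a variance decomposition over the cluster's expression submatrix. The following is a minimal sketch, not the fitness function used by GA-Shaving or EDA-Shaving, which the abstract does not specify. The function names `cluster_variances` and `coherence`, and the choice of the between-to-total variance ratio as a score, are assumptions made for illustration.

```python
import numpy as np

def cluster_variances(X, members):
    """Decompose the variance of a gene cluster.

    X is a genes x samples expression matrix; members lists the row
    indices of the genes in the cluster (hypothetical interface).
    """
    S = X[np.asarray(members)]               # submatrix: cluster genes x samples
    profile = S.mean(axis=0)                 # average expression per sample
    grand_mean = S.mean()
    v_between = np.mean((profile - grand_mean) ** 2)  # spread of the average profile across samples
    v_within = np.mean((S - profile) ** 2)            # spread of genes around that profile
    v_total = np.mean((S - grand_mean) ** 2)          # equals v_between + v_within
    return v_between, v_within, v_total

def coherence(X, members):
    """Fraction of total variance explained by the cluster's average profile."""
    v_between, _, v_total = cluster_variances(X, members)
    return v_between / v_total if v_total > 0 else 0.0

# Toy data: genes 0 and 1 behave identically, gene 2 is flat.
X = np.array([
    [1.0, -1.0, 1.0, -1.0],
    [1.0, -1.0, 1.0, -1.0],
    [0.0,  0.0, 0.0,  0.0],
])
tight = coherence(X, [0, 1])   # identical genes: all variance is between samples
loose = coherence(X, [0, 2])   # mixed genes: part of the variance is within the cluster
```

A score near 1 means the genes in the cluster follow almost the same profile across samples, while a large `v_between` means that shared profile itself varies strongly from sample to sample. An evolutionary search could, under these assumptions, encode cluster membership as a bit string and favor candidates that score well on both.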

Similar resources

SECURING INTERPRETABILITY OF FUZZY MODELS FOR MODELING NONLINEAR MIMO SYSTEMS USING A HYBRID OF EVOLUTIONARY ALGORITHMS

In this study, a Multi-Objective Genetic Algorithm (MOGA) is utilized to extract interpretable and compact fuzzy rule bases for modeling nonlinear Multi-input Multi-output (MIMO) systems. In the process of nonlinear system identification, structure selection, parameter estimation, model performance and model validation are important objectives. Furthermore, securing low-level and high-level ...

Finding Similar Patterns in Microarray Data

In this paper we propose a clustering algorithm called sCluster for the analysis of gene expression data based on pattern similarity. The algorithm captures the tight clusters exhibiting strongly similar expression patterns in microarray data, and allows a high level of overlap among discovered clusters without forcing every gene into a group, as other algorithms do. This reflects the biological fact that...

Investigating the Role of Factors Affecting the Frequency of Failures in Water Mains Using a Hybrid Regression Model

A water distribution network is one of the important parts of infrastructure systems. The efficient management and proactive planning of capital investment of these assets are fundamental for efficient and effective service delivered by water companies. The direct economic costs (i.e. rehabilitation investment, repair costs, water loss, etc.) as well as indirect costs (i.e. service and traffic ...

Study of Evolutionary and Swarm Intelligent Techniques for Soccer Robot Path Planning

Finding an optimal path for a robot on a soccer field involves different parameters such as the position of the robot, the positions of the obstacles, etc. Owing to its simplicity and smoothness, the Ferguson spline has been employed by many research teams for path planning between arbitrary points on the field. In order to optimize the parameters of the Ferguson spline, some evolutionary or intelligent al...

Optimization of sediment rating curve coefficients using evolutionary algorithms and unsupervised artificial neural network

The sediment rating curve (SRC) is a conventional and common regression model for estimating the suspended sediment load (SSL) from flow discharge. However, in most cases the log-transformation of the data in SRC models causes a bias that leads to underestimation of the SSL. In this study, using the daily stream flow and suspended sediment load data from Shalman hydrometric station on Shalmanroud River, Guilan Pro...

Publication date: 2006